Statistical Section Segmentation in Free-Text Clinical Records

Authors

  • Michael Tepper
  • Daniel Capurro
  • Fei Xia
  • Lucy Vanderwende
  • Meliha Yetisgen-Yildiz
Abstract

Automatically segmenting and classifying clinical free text into sections is an important first step to automatic information retrieval, information extraction and data mining tasks, as it helps to ground the significance of the text within. In this work we describe our approach to automatic section segmentation of clinical records such as hospital discharge summaries and radiology reports, along with section classification into pre-defined section categories. We apply machine learning to the problems of section segmentation and section classification, comparing a joint (one-step) and a pipeline (two-step) approach. We demonstrate that our systems perform well when tested on three data sets, two for hospital discharge summaries and one for radiology reports. We then show the usefulness of section information by incorporating it in the task of extracting comorbidities from discharge summaries.
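
The difference between the joint (one-step) and pipeline (two-step) formulations can be illustrated by how each one labels a record. The sketch below is only an illustration under assumptions that the abstract does not state: it treats each line of a record as the unit being labeled, uses B/I-style tags, and invents the category names and the toy record. It shows the two label encodings, not the authors' features or learning algorithm.

```python
from typing import List, Tuple

# Hypothetical toy record: one (line_text, section_category) pair per line.
# The category names are invented for illustration.
record: List[Tuple[str, str]] = [
    ("CHIEF COMPLAINT:", "chief_complaint"),
    ("Chest pain.", "chief_complaint"),
    ("MEDICATIONS:", "medications"),
    ("Aspirin 81 mg daily.", "medications"),
    ("Lisinopril 10 mg daily.", "medications"),
]


def joint_labels(lines: List[Tuple[str, str]]) -> List[str]:
    """One-step encoding: boundary and category fused into a single tag per line."""
    labels = []
    prev = None
    for _, cat in lines:
        # Simplification: a new section starts whenever the category changes.
        labels.append(("B-" if cat != prev else "I-") + cat)
        prev = cat
    return labels


def pipeline_labels(lines: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Two-step encoding: step 1 tags boundaries only, step 2 gives one category per segment."""
    boundaries, categories = [], []
    prev = None
    for _, cat in lines:
        if cat != prev:
            boundaries.append("B")
            categories.append(cat)
        else:
            boundaries.append("I")
        prev = cat
    return boundaries, categories


def decode_joint(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Turn joint tags into (start, end_exclusive, category) segments."""
    segments: List[List] = []
    for i, tag in enumerate(labels):
        prefix, cat = tag.split("-", 1)
        if prefix == "B" or not segments:
            segments.append([i, i + 1, cat])
        else:
            segments[-1][1] = i + 1  # extend the currently open segment
    return [(s, e, c) for s, e, c in segments]


def decode_pipeline(boundaries: List[str], categories: List[str]) -> List[Tuple[int, int, str]]:
    """Combine step-1 boundaries with step-2 categories into the same segment format."""
    starts = [i for i, b in enumerate(boundaries) if b == "B"]
    ends = starts[1:] + [len(boundaries)]
    return list(zip(starts, ends, categories))
```

On the toy record both encodings decode to the same two segments. In practice they can differ: a joint tagger makes one decision per line, while in a pipeline a boundary error made in step 1 is passed on to the classifier in step 2.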


Similar resources

Automatic Segmentation of Clinical Texts - Preliminary Results

Clinical narratives, such as radiology and pathology reports, are commonly available in electronic form. However, they are also commonly entered and stored as free text, and knowledge of their structure is necessary for enhancing the productivity of the healthcare departments and facilitating research. This paper presents a preliminary study attempting to automatically segment medical reports i...


Maintenance of a Computerized Medical Record Form

Structured entry forms for clinical records should be updated to take into account the physicians' needs during consultation and advances in medical knowledge and practice. We updated the computerized medical record form of a hypertension clinic, based on its previous use and clinical guidelines. A statistical analysis of previously completed forms identified several unnecessary items rarely us...


A Framework for Clustering Massive Text and Categorical Data Streams

Many applications such as news group filtering, text crawling, and document organization require real time clustering and segmentation of text data records. The categorical data stream clustering problem also has a number of applications to the problems of customer segmentation and real time trend analysis. We will present an online approach for clustering massive text and categorical data stre...


A Pragmatic Approach to Summary Extraction in Clinical Trials

ClinicalTrials.gov, the National Library of Medicine clinical trials registry, is a monolingual clinical research website with over 29,000 records at present. The information is presented in static and free-text fields. Static fields contain high-level informational text, descriptors, and controlled vocabularies that remain constant across all clinical studies (headings, general information). F...


A Modified Character Segmentation Algorithm for Farsi Printed Text Using Upper Contour Labelling

In this paper, a modified segmentation algorithm for printed Farsi words is presented. This algorithm is based on a previous work by Azmi that uses the conditional labeling of the upper contour to find the segmentation points. The main objective is to improve the segmentation results for low quality prints. To achieve this, various modifications on local baseline detection, contour labeling an...



Publication date: 2012